DNA hypomethylation characterizes genes encoding tissue-dominant functional proteins in liver and skeletal muscle

Each tissue has a dominant set of functional proteins required to mediate tissue-specific functions. Epigenetic modifications, transcription, and translational efficiency control tissue-dominant protein production. However, how these regulatory mechanisms are coordinated to achieve such tissue-specific protein production remains unclear. Here, we analyzed the DNA methylome, transcriptome, and proteome in mouse liver and skeletal muscle. We found that DNA hypomethylation at promoter regions is globally associated with liver-dominant or skeletal muscle-dominant functional protein production within each tissue, as well as with genes encoding proteins involved in ubiquitous functions in both tissues. Specifically, genes encoding liver-dominant proteins, such as those involved in glycolysis or gluconeogenesis, the urea cycle, complement and coagulation systems, enzymes of tryptophan metabolism, and cytochrome P450-related metabolism, were hypomethylated in the liver, whereas those encoding skeletal muscle-dominant proteins, such as those involved in sarcomere organization, were hypomethylated in the skeletal muscle. Thus, DNA hypomethylation characterizes genes encoding tissue-dominant functional proteins.


Supplementary Text
Identification of Different TF-bound Genes (DTGs) and the association between DMGs, DTGs, DEGs, DRPs, and DEPs (Supplementary Fig. 2, Supplementary Fig. 3)
Gene expression is also regulated by transcription factors (TFs). TF-binding status and DNA hypomethylation have also been shown to be interrelated 1. We retrieved peaks for each TF (liver: 90 TFs, skeletal muscle: 68 TFs) in the liver and skeletal muscle from ChIP-atlas Peak Browser 2.
Using this TF-binding information, we examined how gene expression relates to DNA methylation and TF-binding status.
We examined the overlap between the TF-binding regions and hypo- and hypermethylated CpGs (Supplementary Table 1). We identified TFs whose binding regions significantly overlapped with the hypomethylated CpGs or the hypermethylated CpGs (right-tailed Fisher's exact test, q < 0.01). The binding regions of all TFs except Cbx5 and Crtc in the liver significantly overlapped with hypomethylated CpGs (Supplementary Table 1), and no TFs overlapped with hypermethylated CpGs. The binding regions of all skeletal muscle TFs significantly overlapped with hypomethylated CpGs. This result suggests that the TFs preferentially bind to hypomethylated regions, or that the TFs trigger DNA hypomethylation.
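The overlap test described above can be illustrated with a minimal sketch of a right-tailed Fisher's exact test on a 2x2 table of CpG counts. The function name `fisher_right_tailed`, the table layout, and the toy counts are assumptions made for illustration and are not taken from the study. The multiple-testing correction behind the reported q-values is not specified in this section, so it is not shown.

```python
from math import comb


def fisher_right_tailed(a: int, b: int, c: int, d: int) -> float:
    """Right-tailed Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not taken from the paper):
        a = hypomethylated CpGs inside TF-binding regions
        b = other CpGs inside TF-binding regions
        c = hypomethylated CpGs outside TF-binding regions
        d = other CpGs outside TF-binding regions
    """
    in_peaks = a + b
    hypo = a + c
    total = a + b + c + d
    # Sum the hypergeometric numerators of every table whose top-left
    # cell is at least as large as the observed one.
    tail = sum(
        comb(hypo, k) * comb(total - hypo, in_peaks - k)
        for k in range(a, min(in_peaks, hypo) + 1)
    )
    return tail / comb(total, in_peaks)


# Toy counts, invented for illustration only.
p_value = fisher_right_tailed(3, 1, 1, 3)
print(f"right-tailed p = {p_value:.4f}")
```

Exact integer arithmetic is used for the tail sum, so the only floating-point step is the final division. For genome-scale CpG counts a library routine would likely be faster.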
To compare the TFs between the liver and skeletal muscle, we used TFs for which ChIP-seq data exist in both liver and skeletal muscle (Brd4, Cebpb, Ctcf, Rest, Srf, Tcf3), TFs for which ChIP-seq was performed only in the liver and which are not expressed in skeletal muscle (Foxa1, Nr0b2, Onecut1), and TFs for which ChIP-seq was performed only in skeletal muscle and which are not expressed in the liver (Fosl1, Myf5, Myod1, Myog, Pax3, Pax7) (Supplementary Table 2). When a TF-binding peak was present within the region from 1000 bp upstream to 1000 bp downstream of the TSS of a gene, the TF was assumed to be bound to the gene. Some TFs are known to trigger DNA hypomethylation or to open chromatin. Among such TFs, Foxa1, Myod1, Ctcf, and Cebpb were identified in this analysis.
Foxa1 is a major TF involved in hepatocyte specification 3, and Myod1 is a major TF involved in muscle specification 4. Ctcf and Cebpb are involved in chromatin remodeling 5,6. The number of DMGs bound by Ctcf, Foxa1, Cebpb, and Myod1 was 5,058 in the liver (83% of liver-dominant DMGs) and 1,920 in skeletal muscle (34% of skeletal muscle-dominant DMGs) (Supplementary Fig. 2a-b). The binding of these TFs may trigger DNA hypomethylation of the DMGs. However, in some cases, methylation inhibits the binding of TFs such as Ctcf 7. Further verification is needed to determine whether the binding of the TFs triggers DNA hypomethylation of the DMGs.
For each gene, we retrieved the binding states of the TFs in the liver and skeletal muscle and calculated a score for each TF-binding state based on the distribution of expression levels of the bound genes (see Methods for details). We defined genes whose TF-binding states had different scores in the liver and skeletal muscle as different TF-bound genes (DTGs). There were 3,645 (14%) DTGs activated in the liver, referred to as liver-dominant DTGs, and 6,936 (27%) DTGs activated in skeletal muscle, referred to as skeletal muscle-dominant DTGs (Supplementary Fig. 2c).
We performed enrichment analyses of the DT-DEPs (Supplementary Fig. 3e-f). The DT-DEPs were enriched in pathways belonging to "Metabolism" and "Organismal Systems", such as complement and coagulation in the liver and cardiac muscle contraction in skeletal muscle. The pathways in which DT-DEPs were enriched were similar to those in which DM-DEPs were enriched. We classified DT-DEPs according to which TFs were bound (Supplementary Fig. 3e-f).
In summary, possible causes of differences in protein expression levels include differences in gene expression related to DNA hypomethylation, TF binding, translation, and degradation. DEGs represent the differences in expression level (Fig. 3d). DMGs represent the differences in methylation (Fig. 3b). DTGs represent the differences in TF binding (Supplementary Fig. 2c). DRPs represent differences in post-transcriptional regulation (Fig. 3h). Among the liver-dominant DEPs, 17% were DM-DEPs, 18% were DT-DEPs, and 7% were DR-DEPs (Supplementary Fig. 3c). Among the skeletal muscle-dominant DEPs, 13% were DM-DEPs, 35% were DT-DEPs, and 14% were DR-DEPs (Supplementary Fig. 3d). Therefore, among DEPs in both tissues, approximately 15% were DM-DEPs, 20% were DT-DEPs, and 10% were DR-DEPs. More than half of the DEPs are neither DM-DEPs, DT-DEPs, nor DR-DEPs. These DEPs may be regulated by epigenetic modifications other than DNA methylation, by TFs for which there are no data in this study, or remotely by enhancers.

Figure 1.
Supplemental explanation of our data, DMGs, DEGs, and DEPs. (a) Comparison of methylation ratios (200 bp upstream to 400 bp downstream of the TSS) of each gene between C57BL6 mouse hepatocytes in this study and B6Ncrl mouse liver in ENCODE. (b) Relationship between gene expression and non-CpG (CHG or CHH; H: A, C, or T) methylation across gene regions. Plots show gene expression divided into deciles from low to high and methylation ratios of non-CpG sites for each of the indicated regions of genes. Correlations between expression and methylation ratio are presented as Spearman's rank correlation coefficient ρ. (c) Overlap of genes in hypomethylated state and genes with ATAC-seq peaks retrieved from ChIP-atlas (open chromatin). (d) Overlap of genes in hypermethylated state and genes with ATAC-seq peaks. (e) For each gene, the difference in gene expression level (log2 (TPM in liver/TPM in skeletal muscle)) was plotted against the difference in methylation ratios between liver and skeletal muscle (log2 (methylation percentage in liver/skeletal muscle)). We considered three cases: hypomethylated (0 to 0.4196) genes in both liver and skeletal muscle, hypomethylated genes in one tissue but hypermethylated genes in the other, and hypermethylated genes in both tissues. The number of genes in each case is shown (genes with zero expression or methylation ratio are not included because the ratio cannot be obtained). (f) The region near the TSS was divided into 40 bp intervals, and the density of CpG in each region was determined as (CpG number in each region / length of each region). These CpG density vectors of the genes were divided into two clusters by hierarchical clustering using Euclidean distance and Ward's method. We examined the overlap between the genes belonging to each cluster and DMGs. (g) DMGs among genes with CpG islands. (h-i) Overlap for combinations of highly expressed DEGs and highly methylated DMGs. (h) Overlap between liver-dominant DEGs and skeletal muscle-dominant DMGs. (i) Overlap between skeletal muscle-dominant DEGs and liver-dominant DMGs. (j-k) Overlap for proteins encoded by DEGs and DEPs. (j) (left) Overlap between liver-dominant DEGs and liver-dominant DEPs. (right) Overlap between skeletal muscle-dominant DEGs and liver-dominant DMGs. (k) (left) Overlap between skeletal muscle-dominant DEGs and skeletal muscle-dominant DEPs. (right) Overlap between liver-dominant DEGs and skeletal muscle-dominant DEPs. (l) Venn diagrams show the overlap between non-DMGs and non-DEGs. (m) Venn diagrams show the relationship among non-DMGs, non-DEGs, and non-DEPs.

Supplementary Figure 3.
Overlaps of DMGs, DTGs, DEGs, DRPs, and DEPs. (a-b) The UpSet plot displays the relationships between three gene sets: DMGs, DTGs, and DEGs. The vertical axis shows the sets, and the horizontal axis shows the number of genes in each intersection. The squares under the UpSet plot columns indicate that the intersection corresponds to DM-DEGs or DT-DEGs (cyan: DM-DEGs, orange: DT-DEGs). (a) Liver-dominant, (b) Skeletal muscle-dominant. (c-d) The UpSet plot displays the relationships between five protein sets: proteins encoded by DMGs, proteins encoded by DTGs, proteins encoded by DEGs, DRPs, and DEPs. The vertical axis shows the sets, and the horizontal axis shows the number of proteins in each intersection. The squares under the UpSet plot columns indicate that the intersection corresponds to DM-DEPs, DT-DEPs, or DR-DEPs (cyan: DM-DEPs, orange: DT-DEPs, purple: DR-DEPs). (c) Liver-dominant, (d) Skeletal muscle-dominant. (e-f) The numbers of DM-DEPs, DT-DEPs, and DR-DEPs are shown in the heatmap (left). The number of DT-DEPs classified by the TFs is also shown (right); pathways in which DM-DEPs/DT-DEPs/DR-DEPs are enriched (q < 0.01 in the right-tailed Fisher's exact test) are marked with *. (e) Liver-dominant, (f) Skeletal muscle-dominant. (g) Overlaps between DR-DEPs and genes which were non-DMGs and non-DEGs. (metab.: metabolism, degrdn.: degradation, ER: endoplasmic reticulum, CYP: cytochrome P450, conv.: conversion, synth.: synthesis; metab. and cascades at the end of pathway names are omitted). (h-i) Overlaps between DR-DEPs and proteins with 5'-TOP motifs. (h) Liver-dominant, (i) Skeletal muscle-dominant. (j-k) Pie chart showing the proportion of differential histone modification states in liver or skeletal muscle. (j) H3K4me3, (k) H3K27ac. (l-m) The UpSet plot displays the relationships between five gene sets: DMGs, DTGs, DEGs, genes with different H3K4me3 states, and genes with different H3K27ac states. The vertical axis shows the sets, and the horizontal axis shows the number of proteins in each intersection. (l) Liver-dominant, (m) Skeletal muscle-dominant.
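The CpG density calculation described for panel (f) of Figure 1 (40 bp intervals, CpG number divided by region length) can be sketched as below. Only the binning step is shown; the subsequent hierarchical clustering with Euclidean distance and Ward's method is not implemented here. The function name `cpg_density_vector`, the rule that a CpG belongs to the bin containing its cytosine, and the toy sequence are assumptions for illustration.

```python
def cpg_density_vector(seq: str, bin_size: int = 40) -> list[float]:
    """Split a sequence into consecutive bins and return CpG density per bin.

    Density is (number of CpG dinucleotides in the bin) / (bin length).
    A CpG is assigned to the bin that contains its C (an assumption).
    """
    seq = seq.upper()
    densities = []
    for start in range(0, len(seq), bin_size):
        stop = min(start + bin_size, len(seq))
        # Count "CG" dinucleotides whose C lies inside this bin.
        n_cpg = sum(1 for i in range(start, stop) if seq[i:i + 2] == "CG")
        densities.append(n_cpg / (stop - start))
    return densities


# Toy sequence, invented for illustration: a CpG-dense bin, then a CpG-free bin.
toy_seq = "CG" * 20 + "A" * 40
print(cpg_density_vector(toy_seq))
```

Each gene would contribute one such vector, and the vectors could then be passed to any standard Ward-linkage clustering routine.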